
Fix ASAN build clobbering packages after build on the same system#13

Open
croos12 wants to merge 7 commits into master from croos-asan-build

Conversation

@croos12
Owner

@croos12 croos12 commented Mar 4, 2026

Depends on:
croos12/sonic-sairedis#10
croos12/sonic-swss#6

Why I did it

ASAN-instrumented deb packages and Docker images had the same filenames as regular builds. Building ASAN on top of a regular build on the same system therefore either clobbered the regular artifacts or reused stale ones, such as the swss and syncd debs, which need to be rebuilt for ASAN. Additionally, ENABLE_ASAN was not exported during Docker image builds, so ASAN_OPTIONS was never set in supervisord.conf at runtime.

How I did it

  • Renamed ASAN deb targets with -asan suffix (e.g., swss-asan_1.0.0_amd64.deb) using a new _DPKG_DEB_NAME property to map dpkg output names to renamed Make targets in slave.mk.
  • Renamed ASAN Docker images with -asan suffix (e.g., docker-orchagent-asan.gz, docker-syncd-mlnx-asan.gz).
  • Added debug symbol packages (LIBSWSSCOMMON_DBG, LIBSAIREDIS_DBG, LIBSAIMETADATA_DBG) to ASAN Docker image dependencies for complete stack traces.
  • Removed redundant $(ENABLE_ASAN) from _DEP_FLAGS since distinct filenames now create separate cache entries.
  • Exported ENABLE_ASAN in the Docker image build recipe so Dockerfile.j2 can pass it through to docker-init.sh and supervisord.conf.
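
The renaming in the first bullet can be illustrated with a small shell sketch. This is not the slave.mk logic from this PR: `asan_deb_name` is a hypothetical helper that only shows how a dpkg output filename would map to its `-asan` target name, assuming the usual `<package>_<version>_<arch>.deb` layout.

```shell
# Hypothetical helper: map a dpkg output filename to its ASAN target name.
# Assumes the filename has the form <package>_<version>_<arch>.deb.
asan_deb_name() {
    local deb="$1"
    local pkg="${deb%%_*}"   # text before the first underscore
    local rest="${deb#*_}"   # everything after the first underscore
    printf '%s-asan_%s\n' "$pkg" "$rest"
}

asan_deb_name swss_1.0.0_amd64.deb   # prints swss-asan_1.0.0_amd64.deb
```

Because the suffix lands on the package part of the name, the regular and ASAN debs can sit side by side in the same output directory.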

How to verify it

  1. Build regular then ASAN: both artifacts coexist in target/ (docker-orchagent.gz and docker-orchagent-asan.gz).
  2. On device inside swss container:
docker exec -it swss bash
ldd /usr/bin/orchagent | grep asan
	libasan.so.8 => /lib/x86_64-linux-gnu/libasan.so.8 (0x00007efca6d19000)
grep ASAN /usr/bin/docker-init.sh
     -a "{\"ENABLE_ASAN\":\"y\"}" \
sonic-cfggen -y /etc/sonic/sonic_version.yml -v asan
yes
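
The ldd check above could be scripted roughly as follows. This is a minimal sketch: `links_asan_runtime` is a hypothetical helper name, and the sample line is copied from the output shown above rather than read from a device.

```shell
# Hypothetical helper: succeed if the ldd output on stdin lists the ASAN runtime.
links_asan_runtime() {
    grep -q 'libasan\.so'
}

# Sample line taken from the output above. On a device, pipe in real output:
#   ldd /usr/bin/orchagent | links_asan_runtime
sample='libasan.so.8 => /lib/x86_64-linux-gnu/libasan.so.8 (0x00007efca6d19000)'
if printf '%s\n' "$sample" | links_asan_runtime; then
    echo "ASAN runtime linked"
else
    echo "ASAN runtime missing"
fi
```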

Verify ASAN is active at runtime:

export ASAN_OPTIONS="detect_leaks=1:log_path=/var/log/asan/test:verbosity=2"
supervisorctl stop orchagent
/usr/bin/orchagent.sh &
sleep 5
kill -TERM $(pidof orchagent)
ls -la /var/log/asan/
test.log.5095
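
To find where logs should appear, the log_path value can be pulled out of the ASAN_OPTIONS string. The sketch below assumes the options are colon-separated, as in the export above; `asan_option_value` is a hypothetical helper, not part of this PR.

```shell
# Hypothetical helper: print the value of one key from a colon-separated
# ASAN_OPTIONS string, or return 1 if the key is absent.
asan_option_value() {
    local options="$1" key="$2" entry
    local IFS=':'
    for entry in $options; do
        case "$entry" in
            "$key"=*) printf '%s\n' "${entry#*=}"; return 0 ;;
        esac
    done
    return 1
}

# prints /var/log/asan/test
asan_option_value "detect_leaks=1:log_path=/var/log/asan/test:verbosity=2" log_path
```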

Tested branch (Please provide the tested image version)

Description for the changelog

Fix ASAN build clobbering packages after build on the same system

Signed-off-by: Connor Roos <croos@nvidia.com>

Copilot AI left a comment

Pull request overview

This PR updates the SONiC build system to better support AddressSanitizer (ASAN) builds alongside regular builds by introducing ASAN-suffixed artifact naming and adjusting how debs/docker images are produced and cached.

Changes:

  • Rename key deb/docker outputs for ASAN builds (e.g., swss-asan, syncd-asan, docker-orchagent-asan, docker-syncd-*-asan) to avoid collisions with regular build artifacts.
  • Teach the dpkg deb move step to support renaming via *_DPKG_DEB_NAME, enabling “target name != dpkg output filename”.
  • Update several *_DEP_FLAGS definitions to drop $(ENABLE_ASAN) so dependency/cache keys align with the new artifact naming approach.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
slave.mk Adds deb renaming support via *_DPKG_DEB_NAME during artifact move; exports ENABLE_ASAN for docker Jinja rendering.
rules/sysmgr.dep Removes $(ENABLE_ASAN) from sysmgr dep flags.
rules/syncd.mk Introduces ASAN-suffixed syncd deb names and dpkg output name mapping.
rules/syncd.dep Removes $(ENABLE_ASAN) from syncd dep flags.
rules/swss.mk Introduces ASAN-suffixed swss deb names and dpkg output name mapping.
rules/swss.dep Removes $(ENABLE_ASAN) from swss dep flags.
rules/docker-orchagent.mk Adds ASAN-suffixed orchagent docker image name and extra deps when ASAN enabled.
rules/docker-orchagent.dep Removes $(ENABLE_ASAN) from orchagent docker dep flags.
platform/template/docker-syncd-bookworm.mk Adds ASAN-suffixed syncd docker base image naming.
platform/mellanox/docker-syncd-mlnx.mk Adds extra dbg deps for ASAN syncd docker image.
platform/mellanox/docker-syncd-mlnx.dep Removes $(ENABLE_ASAN) from Mellanox syncd docker dep flags.

@croos12 croos12 changed the title Set ASAN build to work on top of regular build Fix ASAN clobbering after build on the same system Mar 4, 2026
@croos12 croos12 changed the title Fix ASAN clobbering after build on the same system Fix ASAN build clobbering packages after build on the same system Mar 4, 2026
@vivekrnv

vivekrnv commented Mar 5, 2026

I suppose these must also be changed; we need to add the extra packages swss-asan and syncd-asan as targets here:

https://github.com/sonic-net/sonic-swss/blob/master/debian/control

https://github.com/sonic-net/sonic-sairedis/blob/master/debian/control

@croos12
Owner Author

croos12 commented Mar 5, 2026

@copilot make the change vivek asked for

Signed-off-by: Connor Roos <croos@nvidia.com>
@croos12 croos12 marked this pull request as ready for review March 6, 2026 17:38
croos12 added 2 commits March 6, 2026 18:04
Signed-off-by: Connor Roos <croos@nvidia.com>
Signed-off-by: Connor Roos <croos@nvidia.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated 3 comments.

Signed-off-by: Connor Roos <croos@nvidia.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 12 out of 12 changed files in this pull request and generated no new comments.

croos12 added 2 commits March 10, 2026 17:56
Signed-off-by: Connor Roos <croos@nvidia.com>
Signed-off-by: Connor Roos <croos@nvidia.com>
@vivekrnv

LGTM, do you have sonic-swss and sonic-sairedis PRs?

@croos12
Owner Author

croos12 commented Mar 14, 2026

> LGTM, do you have sonic-swss and sonic-sairedis PRs?

Added them to the description

croos12 pushed a commit that referenced this pull request Mar 19, 2026
…net#25643)

* [build] Add build timing report and dependency analysis tools

Add three scripts for build performance instrumentation:

- scripts/build-timing-report.sh: Parse per-package timing from build
  logs (HEADER/FOOTER timestamps), generate sorted duration table,
  phase breakdown, parallelism timeline, and CSV export.

- scripts/build-dep-graph.py: Parse rules/*.mk dependency graph,
  compute critical path, fan-out/fan-in bottleneck analysis, and
  generate DOT/JSON output for visualization.

- scripts/build-resource-monitor.sh: Sample CPU, memory, disk I/O,
  and Docker container count during builds for resource utilization
  analysis.

Add "make build-report" target to slave.mk that runs the timing
report and dependency analysis after a build completes.

Example output from a VS build on 24-core/30GB machine:
- 210 packages built in 53m wall time (173m CPU)
- Max concurrency: 5 (with SONIC_CONFIG_BUILD_JOBS=4)
- Critical path: 14 packages deep (libnl -> libswsscommon -> utilities)
- Top bottleneck: LIBSWSSCOMMON with 48 downstream dependents

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>

* Address Copilot review: fix 17 bugs in build analysis scripts

- Use free -m with division instead of free -g to avoid rounding (#1)
- Add = and ?= to Makefile dependency regex patterns (#2, #7)
- CPU calculation now uses /proc/stat delta (two reads) (#3, #14)
- Fix misleading 'critical path estimate' comment (#4)
- Fix parallelism timeline comment (60s not 10s) (#5)
- Include after-relationship packages in fan stats (#6)
- Guard disk I/O division by zero when INTERVAL<=1 (#8)
- Remove unused elapsed_line variable (#9)
- Remove redundant LIBSWSSCOMMON_DBG check (#10)
- Remove active_make_jobs from CSV header comment (#11)
- Wire up _RDEPENDS parsing to build reverse deps (#12)
- Remove unnecessary 'if v' filter on rdeps JSON (#13)
- Remove unused REPORT_FORMAT parameter (#15)
- Add cycle detection to critical path algorithm (#16)
- Add execute permission check for companion scripts (#17)
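
The /proc/stat delta approach named in the list above can be sketched as follows. This is an illustration only, not the code from scripts/build-resource-monitor.sh: `cpu_busy_percent` is a hypothetical helper that takes two aggregate `cpu` lines sampled at different times and treats idle plus iowait as idle time.

```shell
# Hypothetical helper: integer CPU busy percentage between two aggregate
# "cpu" lines taken from /proc/stat at different times.
cpu_busy_percent() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            # Fields 2-9: user nice system idle iowait irq softirq steal
            for (i = 2; i <= 9; i++) total += $i
            idle = $5 + $6   # idle + iowait
            if (NR == 1) { t0 = total; i0 = idle }
            else         { dt = total - t0; di = idle - i0 }
        }
        END {
            if (dt <= 0) { print 0; exit }
            printf "%d\n", (dt - di) * 100 / dt
        }'
}

# Synthetic samples: 200 ticks elapsed, 100 of them idle, so this prints 50.
cpu_busy_percent "cpu 100 0 100 800 0 0 0 0 0 0" \
                 "cpu 150 0 150 900 0 0 0 0 0 0"
```

On a live system the two arguments would come from reading the first line of /proc/stat twice with a sleep in between.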

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>

---------

Signed-off-by: Rustiqly <rustiqly@users.noreply.github.com>
Co-authored-by: Rustiqly <rustiqly@users.noreply.github.com>
croos12 pushed a commit that referenced this pull request Mar 25, 2026
…dating udevd rules (sonic-net#26343)

- Why I did it
On SONiC SmartSwitch platforms with DPUs, systemd-udevd crashes with SIGABRT on every reboot when DPU firmware initialization is slow. During the initramfs boot phase, a standalone systemd-udevd daemon is started to handle device discovery. If DPU firmware takes longer than the 60-second udevadm settle timeout (BlueField-3 DPUs can take 120 seconds each in the failure case when they are stuck), the initramfs cannot stop this udevd before switch_root. The stale process survives into the real system but is never chrooted into the overlayfs root, leaving it with a broken filesystem view. When dpu-udev-manager.sh writes udev rules, the stale udevd detects the change and crashes on an assertion in systemd's chase() path resolution (assert(path_is_absolute(p)) at chase.c:648), because dir_fd_is_root() returns false for a process whose root still points to the initramfs rootfs rather than the overlayfs.

This triggers a systemd issue, systemd/systemd#29559, which the maintainers do not consider a bug on the systemd side. Raising this fix for our use case.

Core was generated by `/usr/lib/systemd/systemd-udevd --daemon --resolve-names=never'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
(gdb) bt
#0  0x00007f29fe7f695c in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x00007f29fe7a1cc2 in raise () from /lib/x86_64-linux-gnu/libc.so.6
#2  0x00007f29fe78a4ac in abort () from /lib/x86_64-linux-gnu/libc.so.6
#3  0x00007f29fea50c11 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#4  0x00007f29feb94a8b in chase () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#5  0x00007f29feb956e2 in chase_and_opendir () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#6  0x00007f29feb9a609 in conf_files_list_strv () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#7  0x00007f29fea913e8 in config_get_stats_by_path () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#8  0x0000559f295519cf in ?? ()
#9  0x0000559f29553a77 in ?? ()
#10 0x00007f29fec36055 in ?? () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#11 0x00007f29fec3668d in sd_event_dispatch () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#12 0x00007f29fec394a8 in sd_event_run () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#13 0x00007f29fec396c7 in sd_event_loop () from /usr/lib/x86_64-linux-gnu/systemd/libsystemd-shared-257.so
#14 0x0000559f29545820 in ?? ()
#15 0x00007f29fe78bca8 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#16 0x00007f29fe78bd65 in __libc_start_main () from /lib/x86_64-linux-gnu/libc.so.6
#17 0x0000559f29545c51 in ?? ()

- How I did it
Added a kill_stale_udevd() function to dpu-udev-manager.sh that runs before writing the udev rules. It identifies the systemd-managed udevd PID via systemctl show, then kills any other systemd-udevd --daemon process that doesn't match -- these are leftover initramfs instances. If no stale process exists (e.g. DPUs are healthy and the initramfs udevd exited cleanly), the function is a no-op.
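
The selection logic described above might look roughly like the sketch below. It is not the function from dpu-udev-manager.sh: both function names are hypothetical, the default kill signal is an assumption, and the flow function is only defined here, never invoked.

```shell
# Hypothetical pure helper: print every candidate PID that is not the
# systemd-managed one. Arguments: <managed_pid> <candidate_pid>...
stale_udevd_pids() {
    local managed="$1" pid
    shift
    for pid in "$@"; do
        [ "$pid" != "$managed" ] && printf '%s\n' "$pid"
    done
    return 0
}

# Sketch of the surrounding flow (assumed, not taken from the PR).
kill_stale_udevd_sketch() {
    local managed pid
    managed=$(systemctl show -p MainPID --value systemd-udevd.service)
    # pgrep -f matches the pattern against the full command line.
    for pid in $(stale_udevd_pids "$managed" $(pgrep -f 'systemd-udevd --daemon')); do
        echo "killing stale udevd PID $pid (systemd udevd is PID $managed)"
        kill "$pid"
    done
}

stale_udevd_pids 412 198 412   # prints 198
```

When the only candidate is the managed PID, the helper prints nothing and the loop body never runs, which matches the no-op case described above.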

- How to verify it
  1. Deploy the image on a SmartSwitch with DPUs in a state where firmware initialization times out (>60s per DPU), by stopping image installation before the firmware install step.
  2. Reboot the switch.
  3. Verify there are no new systemd-udevd coredumps in /var/core/.
  4. Verify the stale process was killed: journalctl -b 0 | grep dpu-udev-manager should show killing stale initramfs udevd PID (systemd udevd is PID ).
  5. Verify systemd-udevd.service is healthy: systemctl status systemd-udevd should show active (running).
  6. Verify DPU udev rules were written: cat /etc/udev/rules.d/92-midplane-intf.rules should contain the DPU interface naming rules.

Signed-off-by: Hemanth Kumar Tirupati <tirupatihemanthkumar@gmail.com>
3 participants